This notebook shows how the data for modeling waitlist mortality is generated.
TODO: 1. Combine candidate data with waitlist ranges to see if wide/narrow offer consideration impacts survival (e.g., only consider donors < 200mi)
Changes: 1. Added Body Surface Area (BSA) 2. Updated Center Stats Calculations
Settings and Functions
Code
plot_data <-function(data, var, n =12){# n is the maximum number of bins/levels to use in plot y =1-data$outcome # set survival as outcome of interest x = data %>%pull({{var}})if(all(is.na(x))) return(ggplot()) k =5# shrinkage prior mu =mean(y)if(n_distinct(x) <= {{n}}) x =factor(x)if(is.character(x)) x =factor(x)if(is.factor(x)) x =fct_lump_n(x, n = {{n}})if(is.numeric(x) | lubridate::is.POSIXt(x) | lubridate::is.Date(x)){ brks = ggplot2:::breaks(x, "numbers", nbins = {{n}}) %>%unique()#x = cut(x, c(-Inf,brks)) x =cut(x, brks, include.lowest =TRUE) } x =fct_na_value_to_level(x) # convert NAs to explicit leveltibble(x, y) %>%group_by(x) %>%summarize(n =n(), # number in bin/leveln1 =sum(y >0), # number of outcomes of interest in bin/levelp = (n1 + k*mu)/(n + k), # laplace smoothing estimatese =sqrt(p*(1-p)/n), # confidence intervals based on laplace lower =pmax(0, p -2*se),# approx 95% intervalsupper =pmin(1, p +2*se) ) %>%ggplot(aes(x, p)) +geom_hline(yintercept =mean(y), lty =3) +geom_errorbar(aes(ymin=lower, ymax=upper)) +geom_point() +# labs(x = xlab) + # scale_x_discrete(guide = guide_axis(n.dodge = 2))scale_x_discrete(labels = scales::label_wrap(10)) +labs(x =sym(var), y ="Waitlist Survival") +coord_cartesian(ylim =c(.80, 1))}
Get Population
Thoracic Data
The thoracic data has information collected on transplant candidates and recipients. The data was retrieved from UNOS on 2022-04-01.
The REM_CD is the coded reason for removal from the waitlist. The details are in the STAR documentation.
The created outcome variable corresponds to an adverse outcome defined as death (REM_CD = 8) or Candidate too sick to transplant (REM_CD = 13, “Cand. cond. deteriorated,too sick to tx”).
outcome = 1 is adverse.
outcome = 0 is everything else (i.e., candidate was removed from the waitlist for anything besides death or deteriorization).
Code
REM_CD_code = readxl::read_excel(file.path(dir_data, "STAR File Documentation.xls"),sheet ="THORACIC_FORMATS_FLATFILE",skip=1) %>%filter(`SAS ANALYSIS FORMAT`=="REMCD") %>%select(REM_CD =`Data Field Value`,REMOVAL_REASON =`Data Field Formatted Value` ) %>%mutate(outcome = 1L*(REM_CD %in%c("8", "13")),outcome_full =case_match(REM_CD,#: adverse outcomes"8"~"Death","13"~"Too sick", # Cand. cond. deteriorated,too sick to tx#: Still alive or Tx'edc("2", "3", "4", "14", "15", "21", "23") ~"Tx","6"~"Alive", # Refused transplant"7"~"Alive", # Transferred to another center "12"~"Alive", # Cand. condition improved, tx not needed"16"~"Alive", # Candidate Removed in Error#: other"10"~"Listed in Error","24"~"Lost contact",.default ="Other" ) )
Population
1. Pediatric Candidate (Age < 18) on waitlist for heart
There are 6244 pediatric waitlists between 2010-01-01 and 2020-12-31. There were 5972 unique candidates (since some were on multiple waitlists during this period).
3. No prior heart transplants
We can discuss allowing these candidates. My initial reason for exclusion is keep things simple. Otherwise, we’d need to keep track of the number of past transplants and the time from the last transplant (some are only a few days).
Code
thor3 = thor2 %>%filter( NUM_PREV_TX ==0, !(PREV_TX %in%"Y"),!(THORACIC_DGN %in%c(1700, 1100:1199)) # codes related to RE-TX/GF )
This removed 399 and leaves 5845.
4. Patient’s first heart waitlist
Keep only the data from the first time a patient was listed. This will exclude the waitlists corresponding to patients i) who had previous heart transplants, ii) are listed multiple times, and iii) were transferred to another center.
We could keep some of these, but handling the multiple waitlists and transfers becomes a little tricky.
Code
first_wl = thoracic %>%group_by(PT_CODE) %>%slice_min(INIT_DATE, with_ties =FALSE) %>%ungroup()# For more precise identification of first waitling list, use the # waitlist data: WL_PT %>% filter(seq_wl == 1)thor4 = thor3 %>%semi_join(first_wl, by ="WL_ID_CODE")
This removed 176 and leaves 5669.
5. Remove waitlists with mistakes
The removal code corresponding to listing mistakes (REM_CD = 10 “Candidate listed in error”) are removed from the data.
Code
thor5 = thor4 %>%filter( !(REM_CD %in%c("10")) )
This removed 0 and leaves 5669.
6. Remove waitlists with unknown outcomes
The removal codes corresponding to candidates lost (REM_CD = 24 “Unable to contact candidate”) and candidates still alive at data collection/censored (REM_CD = NA) are removed from the data.
DATA_WL = WL %>%filter(CHG_TY =="A") %>%# collect data when added to waitlistsemi_join(population, by ="WL_ID_CODE") %>%select( WL_ID_CODE, # ORG, # STATUS, WL_DT = CHG_DT, # Date-time candidate first added to waitlist PRELIM_XMATCH_REQ, # INACT_REASON_CD, # not many casesstarts_with("DONCRIT"), -DONCRIT_ACPT_HIST_CIG, # all missing-DONCRIT_GENDER_REQ, # not many gender specified ) %>%mutate(PRELIM_XMATCH_REQ =recode_YNU(PRELIM_XMATCH_REQ), across(starts_with("DONCRIT_ACPT"), recode_YNU) )
EDA: Waitlist Data
Code
model_data = DATA_WL %>%full_join(population, by ="WL_ID_CODE")vars =colnames(model_data) %>%setdiff(colnames(population))walk(vars, ~plot_data(model_data, var = .x) %>% print)
1203: CONGENITAL HEART DEFECT - PRIOR SURGERY UNKNOWN
Other
7
1
0.885
1002: DILATED MYOPATHY: POST PARTUM
Dilated Cardiomyopathy
8
1
0.893
1004: DILATED MYOPATHY: MYOCARDITIS
Myocarditis
180
19
0.904
999: OTHER - SPECIFY
Dilated Cardiomyopathy
32
3
0.912
1005: DILATED MYOPATHY: ALCOHOLIC
Dilated Cardiomyopathy
1
0
0.917
1204: CANCER
Other
1
0
0.917
1099: RESTRICTIVE MYOPATHY: OTHER SPECIFY
Restrictive Cardiomyopathy
48
4
0.921
1052: RESTRICTIVE MYOPATHY: ENDOCARDIAL FIBROS
Restrictive Cardiomyopathy
2
0
0.929
1001: DILATED MYOPATHY: ADRIAMYCIN
Dilated Cardiomyopathy
42
3
0.930
1201: HYPERTROPHIC CARDIOMYOPATHY
Hypertrophic Cardiomyopathy
174
12
0.935
Other
3
0
0.938
1049: DILATED MYOPATHY: OTHER SPECIFY
Dilated Cardiomyopathy
280
17
0.942
1050: RESTRICTIVE MYOPATHY: IDIOPATHIC
Restrictive Cardiomyopathy
233
14
0.942
1209: MUSCULAR DYSTROPHY: OTHER SPECIFY
Dilated Cardiomyopathy
5
0
0.950
1000: DILATED MYOPATHY: IDIOPATHIC
Dilated Cardiomyopathy
1178
58
0.953
1006: DILATED MYOPATHY: VIRAL
Myocarditis
26
1
0.953
1200: CORONARY ARTERY DISEASE
Other
10
0
0.967
1003: DILATED MYOPATHY: FAMILIAL
Dilated Cardiomyopathy
191
6
0.968
1208: ARRHYTHMOGENIC RIGHT VENTRICULAR DYSPLASIA/CARDIOMYOPATHY
Other
19
0
0.979
1007: DILATED MYOPATHY: ISCHEMIC
Dilated Cardiomyopathy
31
0
0.986
Center Level Predictors
Listing Center Volume
Calculate the number of pediatric heart transplants each year by listing center
Code
#: Number of pediatric hr transplants in previous yearTX_YR = thoracic %>%filter(coalesce(AGE, INIT_AGE) <18, # Age of transplant recipient ORGAN =="HR" ) %>%count(LISTING_CTR_CODE, YR = TX_YEAR) %>%complete(LISTING_CTR_CODE, YR, fill =list(n=0)) %>%group_by(LISTING_CTR_CODE) %>%arrange(YR) %>%mutate(LISTING_CTR_PEDHRTX_PREV_YR = dplyr::lag(n, default=0) ) %>%ungroup() %>%select(-n)
Listing Center Practice
NOTE: These are averaged over 2010-2019. This can make things a bit misleading as centers that did good will continue to have success.
median_refusals: median number of offer refusals per candidate. Averaged over 2010-2019 by listing center
median_wait_days: median number of days candidate needs to wait until first offer. Averaged over 2010-2019 by listing center.
median_wait_days_1A: median number of days that a Status 1A candidate needs to wait until first offer. Averaged over 2010-2019 by listing center. This will help standardize over centers that have disproportionate number of Status 2’s (for example).
median_wait_days_status: median number of days candidate needs to wait until first offer (by listing status; 1A will wait less than 1B). Averaged over 2010-2019 by listing center